UNCLASSIFIED 


CONFIDENCE SETS IN CHANGE-POINT PROBLEMS (REVISION)
STANFORD UNIV CA DEPT OF STATISTICS D SIEGMUND MAY 87
TR-2 N00014-87-K-0078



AD-A181  075 



CONFIDENCE  SETS  IN  CHANGE-POINT  PROBLEMS 


by 

David  Siegmund 
Stanford  University 


TECHNICAL  REPORT  NO.  2 
MAY  1987 




PREPARED  UNDER  CONTRACT 
N00014-87-K-0078  (NR-042-373) 

FOR  THE  OFFICE  OF  NAVAL  RESEARCH 


Reproduction  in  Whole  or  in  Part  is  Permitted 
for  any  Purpose  of  the  United  States  Government 

Approved  for  public  release;  distribution  unlimited 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 




This  is  a  substantial  revision  of  Technical  Report  No.  39  of  ONR  Contract  N00014-77- 
0306,  entitled  “Confidence  Sets  for  a  Change-point.” 


CONFIDENCE  SETS  IN  CHANGE-POINT  PROBLEMS 


David  Siegmund 

Department  of  Statistics,  Sequoia  Hall,  Stanford  University,  Stanford,  California  94305,  USA 

Summary. 


Several methods are discussed for confidence set estimation of a change-point in a sequence of independent observations from completely specified distributions. The method based on the likelihood ratio statistic is extended to the case of independent observations from an exponential family. Joint confidence sets for the change-point and the parameters of the exponential family are also considered.

Key  words:  change-point,  likelihood  ratio,  boundary  crossing  probabilities 

1.  Introduction. 

Let $x_1, x_2, \dots, x_m$ be independent random variables with $x_1, \dots, x_j$ having distribution F and $x_{j+1}, \dots, x_m$ having distribution $G \ne F$. The change-point j, where the distribution shifts from F to G, is an unknown parameter, to be estimated by a confidence set. In general, the distributions F and G may be known, completely unknown, or specified up to an unknown parameter. In this paper I discuss several procedures for the artificial but informative case of completely specified F and G, and then develop more completely a method based on the likelihood ratio statistic for the case where F and G come from a common exponential family of distributions. Precedent for the approach taken here is found in Worsley (1986) and Siegmund (1986).

A distinguishing feature of change-point problems is that the likelihood function is not smooth, even (or perhaps especially) if the process evolves in continuous time. Hence there is no reason to expect maximum likelihood, likelihood ratio, and Bayes estimates from different prior distributions to lead to asymptotically equivalent results. In fact, confidence sets based directly on the maximum likelihood estimator are demonstrably inferior to those obtained by other methods. See Siegmund (1986) and Ibragimov and Khasminski (1981) for related results in the context of detecting a change in the drift of Brownian motion.


Section 2 is concerned with known F and G. In addition it is assumed that the sequence of observations is actually doubly infinite, $\dots, x_{-1}, x_0, x_1, \dots$. This additional assumption has little effect if m is large and it is known that j is close neither to 1 nor to m, because observations far from the change-point carry little information about the location of the change-point. The virtue of the assumption is that it makes j into a location parameter and provides an exact ancillary statistic: the class of shift invariant events. Five confidence set estimates are discussed. Three are studied by Siegmund (1986) in the context of estimating a change-point in the drift of Brownian motion. The fourth is essentially the suggestion of Cobb (1978), and the fifth has smallest expected size among all shift invariant confidence sets. Section 3 compares the different confidence sets.

Sections 4 and 5 are concerned with the case that F and G are imbedded in a common
exponential family, whose parameter $\theta$ is unknown. Section 4 develops a method based on
the  likelihood  ratio  statistic  for  obtaining  exact  confidence  sets  for  j.  A  new,  fairly  simple 
approximation  is  suggested  for  the  required  probability  calculation.  The  approximation  is 
illustrated  on  the  coal  mining  accident  data  along  the  lines  discussed  by  Worsley  (1986).  In 
Section  5  the  likelihood  ratio  method  is  extended  to  give  a  joint  confidence  set  for  j  and 
a  function  of  the  parameters  of  the  exponential  family.  Technical  results  are  given  in  two 
appendices. 

2. The Case of Known F and G.

Let $\mathbb{Z}$ denote the integers. Let $x_n$, $n \in \mathbb{Z}$, be a sequence of independent random variables with $x_n$ having the distribution function F or G according as $n \le j$ or $n > j$. The distributions F and G are assumed known; the change-point j is unknown. Let $P_j$ denote the probability measure induced by this model on the space of infinite sequences $\omega = (x_n, n \in \mathbb{Z})$. Let $\sigma$ denote the shift operator, i.e., the mapping which takes $\omega = (x_n, n \in \mathbb{Z})$ into $\sigma\omega = (x_{n+1}, n \in \mathbb{Z})$. Note that the family $\{P_j, j \in \mathbb{Z}\}$ is a translation family in the sense that for any event B and $j \in \mathbb{Z}$

$$P_j(B) = P_j(\sigma^{-j}\sigma^{j}B) = P_0(\sigma^{j}B).$$


Let $z_n = \log\{dF(x_n)/dG(x_n)\}$ denote the log likelihood ratio of $x_n$, and put

$$S_n = \begin{cases} z_1 + \cdots + z_n & (n \ge 1),\\ -(z_{n+1} + \cdots + z_0) & (n \le -1),\\ 0 & (n = 0).\end{cases}$$

Let $l_i = dP_i/dP_0$ denote the likelihood function at i. By considering the finite sequence $x_n$, $-N \le n \le N$, and then letting $N \to \infty$, one can easily show that $l_i = \exp(S_i)$. Under $P_0$ the log likelihood process $(S_n, n \in \mathbb{Z})$ is a random walk satisfying $S_0 = 0$ and having increments $S_n - S_{n-1}$ with mean $\int \log(dF/dG)\,dG < 0$ for $n > 0$ and $\int \log(dF/dG)\,dF > 0$ for $n \le 0$.

The maximum likelihood estimator for j is the value $\hat j$ where the process $(S_n, n \in \mathbb{Z})$ assumes its maximum value. In general this value need not be unique, but to avoid technicalities it is assumed to be so in what follows. In the space of the sufficient statistic $(S_n, n \in \mathbb{Z})$, the sequence $Y_i = S_{\hat j + i} - S_{\hat j}$, $i \in \mathbb{Z}$, is ancillary.

In  the  context  of  estimating  a  change-point  in  the  drift  of  a  Brownian  motion  process, 
Siegmund  (1986)  compares  the  following  three  confidence  sets  for  the  change-point  j.  The 
first  two  were  discussed  earlier  by  Hinkley  (1970,  1972),  who,  however,  made  no  attempt  to 
establish  their  relative  efficiency. 

(i) Since $\hat j - j$ is pivotal, if $r = r_\alpha$ is defined by $P_0(|\hat j| > r) = \alpha$, then $C_1 = [\hat j - r, \hat j + r]$ is a $(1 - \alpha)100\%$ confidence interval.

(ii) Let $A_j$ denote the acceptance region of a size $\alpha$ likelihood ratio test of the hypothesis that the change-point is j, i.e., $A_j = \{\max_n S_n - S_j < \eta\}$, where $\eta = \eta_\alpha$ satisfies $P_j(A_j) = \{P_0(\max_{n>0} S_n < \eta)\}^2 = 1 - \alpha$. Then the set $C_2$ of $n \in \mathbb{Z}$ such that the observed sample point $\omega \in A_n$ is a $(1 - \alpha)100\%$ confidence set. Since the log likelihood process $(S_n, n \in \mathbb{Z})$ is in general multimodal, this confidence set is not in general an interval.

(iii) A modification of the preceding method which always yields an interval is to define

$$L = \min\{n : S_n > \max_i S_i - \eta'\}, \qquad R = \max\{n : S_n > \max_i S_i - \eta'\},$$

which for suitable $\eta' < \eta$ satisfies

$$P_j(L \le j \le R) = P_0(L \le 0 \le R) = 1 - 2P_0(R < 0) = 1 - \alpha.$$
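As an illustration only, constructions (ii) and (iii) can be sketched in Python for the normal example of Section 3 (F = N(0, 1), G = N(δ, 1)), where $S_n = \delta\{n\delta/2 - (x_1 + \cdots + x_n)\}$. The sketch assumes a finite record $x_1, \dots, x_m$ in place of the doubly infinite sequence and takes η and η′ as given; the function names are hypothetical.

```python
from typing import List, Tuple


def log_likelihood_process(x: List[float], delta: float) -> List[float]:
    """Return S_0, S_1, ..., S_m for F = N(0, 1), G = N(delta, 1).

    S_k = delta * (k * delta / 2 - (x_1 + ... + x_k)), so exp(S_k) is the
    likelihood of a change after observation k relative to a change at k = 0.
    """
    s = [0.0]
    for xi in x:
        s.append(s[-1] + delta * (delta / 2.0 - xi))
    return s


def lr_confidence_set(s: List[float], eta: float) -> List[int]:
    """Method (ii): every n with max_i S_i - S_n < eta (need not be an interval)."""
    top = max(s)
    return [n for n, sn in enumerate(s) if top - sn < eta]


def lr_confidence_interval(s: List[float], eta_prime: float) -> Tuple[int, int]:
    """Method (iii): L and R are the smallest and largest n with S_n > max_i S_i - eta'."""
    top = max(s)
    inside = [n for n, sn in enumerate(s) if sn > top - eta_prime]
    return min(inside), max(inside)


s = [0.0, 1.0, 3.0, 2.5, 0.5, 2.9, -1.0]
print(lr_confidence_set(s, 1.0))       # [2, 3, 5]: a disconnected set
print(lr_confidence_interval(s, 1.0))  # (2, 5): the interval spanning it
```

The same threshold is used for both calls only to show how (iii) fills the gap at n = 4; in practice η′ is chosen separately so that the interval keeps the nominal level.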

The next possibility is essentially the suggestion of Cobb (1978). In analogy with Fisher's (1934) observation that the conditional probability density of the maximum likelihood estimator of a location parameter given the sample spacings, which are ancillary in that case, is the normalized likelihood function, one may show by a direct calculation that

$$P_j\{\hat j - j = n \mid Y_i, i \in \mathbb{Z}\} = P_0\{\hat j = n \mid Y_i, i \in \mathbb{Z}\} = \exp(S_{\hat j_{\rm obs} - n}) \Big/ \sum_i \exp(S_i), \qquad (1)$$

where $\hat j_{\rm obs}$ denotes the observed value of $\hat j$. Let

$$p_n = \exp(S_n) \Big/ \sum_i \exp(S_i), \qquad n \in \mathbb{Z}. \qquad (2)$$

(iv) It follows from (1) that a confidence set of conditional coverage probability $1 - \alpha$ can be formed as follows. Order the $p_n$ in (2) as $p_{(1)} \ge p_{(2)} \ge \cdots$. Construct the set $C_4$ by putting the index $n_1$ corresponding to $p_{(1)}$ in and continuing to add points $n_2, \dots, n_k$ corresponding to $p_{(2)}, \dots, p_{(k)}$ as long as $p_{(1)} + \cdots + p_{(k-1)} < 1 - \alpha$. Note that for a Bayesian with a uniform prior on $\mathbb{Z}$

$$p_n = P(j = n \mid x_i, i \in \mathbb{Z}),$$

and hence the set $C_4$ is a highest posterior probability credible set for j. In fact, even without the explicit evaluation in (1), one knows from a general theorem of Stein (1965) and Hora and Buehler (1966) that the highest posterior credible set for j is also a confidence set.
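A sketch of (2) and of the construction of $C_4$, again on a finite window of values $S_n$ (the truncation and the function names are assumptions made for illustration):

```python
import math
from typing import List


def formal_posterior(s: List[float]) -> List[float]:
    """p_n = exp(S_n) / sum_i exp(S_i), as in (2); the max is subtracted for stability."""
    top = max(s)
    w = [math.exp(sn - top) for sn in s]
    total = sum(w)
    return [wi / total for wi in w]


def hpd_set(p: List[float], alpha: float) -> List[int]:
    """C_4: add indices in decreasing order of p_n while the mass so far is < 1 - alpha."""
    order = sorted(range(len(p)), key=lambda n: p[n], reverse=True)
    chosen, mass = [], 0.0
    for n in order:
        if mass >= 1.0 - alpha:
            break
        chosen.append(n)
        mass += p[n]
    return sorted(chosen)


p = formal_posterior([0.0, math.log(2.0), math.log(5.0), math.log(2.0)])
print([round(pn, 3) for pn in p])  # [0.1, 0.2, 0.5, 0.2]
print(hpd_set(p, 0.2))             # [1, 2, 3], total mass 0.9 >= 0.8
```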

(v) One can also obtain an unconditional confidence set from the formal posterior probabilities $(p_n, n \in \mathbb{Z})$ in (2) as follows: let c be such that

$$P_j\{p_j \ge c\} = P_0\Big\{\sum_n \exp(S_n) \le c^{-1}\Big\} = 1 - \alpha, \qquad (3)$$

and $C_5 = \{n : p_n \ge c\}$. Then $C_5$ is a $(1 - \alpha)100\%$ confidence set, which according to a general theorem of Hooper (1982) or alternatively by a simple Neyman-Pearson argument has smallest expected size among all shift equivariant confidence sets.
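For the normal example of Section 3 the constant c in (3) can be approximated by simulation. Under $P_0$ both $S_1, S_2, \dots$ and $S_{-1}, S_{-2}, \dots$ are then random walks started at $S_0 = 0$ with $N(-\delta^2/2, \delta^2)$ increments. The sketch below truncates the sum at $|n| \le$ `steps` and uses plain Monte Carlo; the truncation, sample size, seed, and function names are assumptions, not part of the method.

```python
import math
import random
from typing import List


def total_weight(delta: float, steps: int, rng: random.Random) -> float:
    """One draw of sum_n exp(S_n) under P_0, truncated to |n| <= steps."""
    total = 1.0  # the n = 0 term, exp(S_0) = 1
    for _side in range(2):  # n > 0 and n < 0: independent walks with the same law
        s = 0.0
        for _ in range(steps):
            s += rng.gauss(-delta * delta / 2.0, delta)  # mean -delta^2/2, sd delta
            total += math.exp(s)
    return total


def estimate_c(delta: float, alpha: float, reps: int = 1000,
               steps: int = 300, seed: int = 1) -> float:
    """Solve (3): 1/c is the (1 - alpha)-quantile of sum_n exp(S_n) under P_0."""
    rng = random.Random(seed)
    draws = sorted(total_weight(delta, steps, rng) for _ in range(reps))
    k = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return 1.0 / draws[k]


def c5_set(p: List[float], c: float) -> List[int]:
    """C_5 = {n : p_n >= c} for formal posterior probabilities p_n from (2)."""
    return [n for n, pn in enumerate(p) if pn >= c]


print(estimate_c(0.7, 0.10))
```

With δ = 0.7 and α = .10 the estimate should be of the same order as the value of c reported for $C_5$ in Table 2, which was obtained for a finite sequence.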


Remarks.  The  confidence  sets  (ii),  (iv),  and  (v)  all  order  the  parameter  values  for  inclusion 
according  to  the  value  of  the  likelihood  function.  Where  they  disagree  is  where  to  draw  the 
line  between  inclusion  and  exclusion.  For  those  who  strongly  prefer  a  confidence  interval  to  a 
possibly  disconnected  confidence  set,  (iii)  appears  to  be  a  reasonable  modification  of  (ii).  It  is 
possible  to  give  analogous  modifications  of  (iv)  and  (v). 

Of these five confidence sets, all except for (iv) require computation of a sampling distribution. Approximations are suggested in the following section.

3.  Comparisons. 

The  purpose  of  this  section  is  to  compare  the  expected  size  of  the  various  confidence  sets 
proposed  in  Section  2.  Since  the  case  of  known  G  and  F  is  artificially  simple  and  our  main  goal 
is  insight  into  the  case  where  G  and  F  contain  unknown  nuisance  parameters,  there  seems  to 
be little harm in simplifying the technical problems somewhat by assuming that F is $N(0, 1)$
and G is $N(\delta, 1)$ for a known $\delta > 0$.

Siegmund  ( 1986)  considers  the  computationally  simpler  case  of  a  Brownian  motion  process 
and  shows  that  the  length  of  the  confidence  interval  defined  in  (i)  is  substantially  longer  than 
the  expected  size  of  the  confidence  sets  in  (ii)  and  (iii). 

In the present context it can be shown as $\alpha \to 0$ that the expected sizes of the confidence sets in (ii)-(v) are all $\sim 4\delta^{-2}\log\alpha^{-1}$, whereas the length of the interval in (i) is $\sim 8\delta^{-2}\log\alpha^{-1}$.
Hence  the  confidence  interval  C\  defined  in  (i)  appears  not  to  be  competitive  with  the  others 
and  will  not  be  considered  further. 



Although Siegmund's (1986) comparison of (ii) and (iii) favors (ii), the difference is not large. In fact there is a transcription error in passing from the first to the second line of the display following (3.15) of Siegmund (1986), and consequently the difference in the numerical example between methods (ii) and (iii) is smaller than stated there. Since one suspects that the rapid fluctuations of Brownian motion may account for some of that difference, and since (iii) is the only remaining interval estimate and is a surrogate for interval modifications of (iv) and (v), it seems reasonable to make a comparison of (ii) and (iii) in the present discrete time setting. Theorem 1 below gives asymptotic expansions as $\alpha \to 0$ of the expected size of the confidence sets (ii) and (iii).

It seems difficult to give comparably precise expansions for (iv) and (v). Hence (ii), (iv), and (v) are compared below in a Monte Carlo experiment, which also shows that the approximations given in Theorem 1 are reasonably accurate.

We begin with approximations for the coverage probability of (ii) and (iii). Let $\Phi$ be the standard normal distribution function and

$$\nu(x) = 2x^{-2}\exp\Big\{-2\sum_{n=1}^{\infty} n^{-1}\Phi\big(-\tfrac12 x n^{1/2}\big)\Big\} \qquad (x > 0). \qquad (4)$$


For computational purposes it usually suffices to use the small x approximation (Siegmund, 1985, p. 219)

$$\nu(x) = \exp(-\rho x) + o(x^2) \qquad (x \to 0), \qquad (5)$$

where $\rho \approx .583$. For the normally distributed $x_n$, $n \in \mathbb{Z}$, under consideration here $S_n = \delta\{n\delta/2 - (x_1 + \cdots + x_n)\}$, $n = 0, 1, \dots$. It follows from a classical result of Cramér (cf. Siegmund, 1985, (8.49)) that


$$P_0\Big\{\max_{n>0} S_n > \eta\Big\} \sim \nu(\delta)\exp(-\eta) \qquad (\eta \to \infty), \qquad (6)$$

and hence by (5) for $A_j$ defined in (ii) above

$$P_j(A_j) \approx \{1 - \exp(-\eta - \rho\delta)\}^2. \qquad (7)$$
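The function ν in (4) is easy to evaluate directly, which gives a check on (5). A minimal sketch, using $\Phi(-y) = \mathrm{erfc}(y/\sqrt2)/2$ and truncating the series once its terms are negligible (the cutoff and the names are choices made here, not taken from the text); it also solves (7) for η given α:

```python
import math

RHO = 0.583  # constant of the small-x approximation (5)


def nu(x: float) -> float:
    """nu(x) = 2 x^-2 exp{-2 sum_{n>=1} n^-1 Phi(-x sqrt(n) / 2)}, equation (4)."""
    if x <= 0:
        raise ValueError("nu is defined for x > 0")
    n_max = int((18.0 / x) ** 2) + 1  # beyond this, x sqrt(n)/2 > 9 and Phi(-.) < 1e-18
    series = sum(0.5 * math.erfc(x * math.sqrt(n) / (2.0 * math.sqrt(2.0))) / n
                 for n in range(1, n_max + 1))
    return 2.0 / (x * x) * math.exp(-2.0 * series)


def eta_from_7(alpha: float, delta: float) -> float:
    """Solve {1 - exp(-eta - rho delta)}^2 = 1 - alpha, i.e. approximation (7) for eta."""
    return -math.log(1.0 - math.sqrt(1.0 - alpha)) - RHO * delta


for x in (0.5, 0.7, 1.0):
    print(x, round(nu(x), 4), round(math.exp(-RHO * x), 4))
print(round(eta_from_7(0.05, 0.7), 2))  # 3.27, the distance used for Figure 1
```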


By conditioning on $\max_{n>0} S_n$, one may show for R defined in (iii),

$$P_0(R < 0) = P_0\Big\{\max_{n<0} S_n > \max_{n>0} S_n + \eta'\Big\} \sim \nu(\delta)\exp(-\eta')\,E_0\exp\Big(-\max_{n>0} S_n\Big) \qquad (8)$$

as $\eta' \to \infty$. It is possible to compute the expectation on the right-hand side of (8) numerically or give a small $\delta$ expansion analogous to (5), but for our purposes it seems adequate to pretend that (6) is an equality, which after an integration by parts in (8) leads to the approximation

$$P_j\{j \notin [L, R]\} \approx 2\exp(-\eta' - \rho\delta)\{1 - \exp(-\rho\delta)/2\}. \qquad (9)$$
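The integration by parts behind (9) can be spelled out. One reading of the "pretend" step (an assumption made here) is that, with $M = \max_{n>0} S_n$ and $\nu = \nu(\delta)$, $P_0(M > x) = \nu e^{-x}$ for all $x \ge 0$, so that M has an atom of mass $1 - \nu$ at 0. Then

$$E_0 e^{-M} = 1 - \int_0^\infty e^{-x} P_0(M > x)\,dx = 1 - \nu\int_0^\infty e^{-2x}\,dx = 1 - \tfrac12\nu,$$

$$P_j\{j \notin [L, R]\} = 2P_0(R < 0) \approx 2\nu e^{-\eta'}\big(1 - \tfrac12\nu\big),$$

and replacing $\nu$ by $e^{-\rho\delta}$ according to (5) gives (9).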


The following theorem gives an asymptotic expansion as $\alpha \to 0$ of the expected size of $C_2$ defined in (ii) and $[L, R]$ defined in (iii). It will be convenient to use the notation $\lfloor y \rfloor$ = integer part of y, $|C|$ = number of elements in the set C, and $M = \sup_{n>0} S_n$.

Theorem 1. Let $C_2$ be the confidence set defined in (ii) and $[L, R]$ the confidence interval defined in (iii). As $\eta \to \infty$

$$E_j|C_2| = 2\lfloor 2\eta/\delta^2 \rfloor + 4/\delta^2 - 4\delta^{-2}\int_0^\infty \{2P_0(M > x) - P_0^2(M > x)\}\,dx + o(1),$$

and as $\eta' \to \infty$

$$E_j(R - L) = 2\lfloor 2\eta'/\delta^2 \rfloor + 4/\delta^2 - 4\delta^{-2}\int_{[0,\infty)}\int_0^\infty P_0(M \in dy)\{2P_0(M > x + y) - P_0^2(M > x + y)\}\,dx + o(1).$$

A proof is sketched in an appendix.

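To obtain easily evaluated approximations to the integrals appearing in these expressions, one may again pretend that (6) is an equality and use (5). This leads to

$$E_j|C_2| \approx 2\lfloor 2\eta/\delta^2 \rfloor + 2\delta^{-2}(2 - 4e^{-\rho\delta} + e^{-2\rho\delta}) \qquad (10)$$

and

$$E_j(R - L) \approx 2\lfloor 2\eta'/\delta^2 \rfloor + 2\delta^{-2}(2 - 4e^{-\rho\delta} + 3e^{-2\rho\delta} - 2e^{-3\rho\delta}/3). \qquad (11)$$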
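Under the same pretense ($P_0(M > x) = \nu e^{-x}$ for $x \ge 0$, with $\nu = \nu(\delta)$), the integrals in Theorem 1 can be evaluated in closed form. The following sketch shows the arithmetic that produces the constant terms of (10) and (11):

$$\int_0^\infty \{2P_0(M > x) - P_0^2(M > x)\}\,dx = \int_0^\infty (2\nu e^{-x} - \nu^2 e^{-2x})\,dx = 2\nu - \tfrac12\nu^2,$$

$$4\delta^{-2} - 4\delta^{-2}\big(2\nu - \tfrac12\nu^2\big) = 2\delta^{-2}(2 - 4\nu + \nu^2).$$

For the double integral, $E_0 e^{-M} = 1 - \tfrac12\nu$ and $E_0 e^{-2M} = 1 - 2\int_0^\infty e^{-2x}\,\nu e^{-x}\,dx = 1 - \tfrac23\nu$, so

$$\int\!\!\int P_0(M \in dy)\{2\nu e^{-x-y} - \nu^2 e^{-2(x+y)}\}\,dx = 2\nu E_0 e^{-M} - \tfrac12\nu^2 E_0 e^{-2M} = 2\nu - \tfrac32\nu^2 + \tfrac13\nu^3,$$

$$4\delta^{-2} - 4\delta^{-2}\big(2\nu - \tfrac32\nu^2 + \tfrac13\nu^3\big) = 2\delta^{-2}\big(2 - 4\nu + 3\nu^2 - \tfrac23\nu^3\big).$$

With $\nu \approx e^{-\rho\delta}$ from (5) these are the constant terms in (10) and (11).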

Table 1 contains some numerical examples. It indicates that there is essentially no difference between the expected size of the confidence sets (ii) and (iii). On the basis of these results a statistician who strongly prefers a confidence interval to the generally disconnected likelihood ratio confidence set should feel comfortable in imposing that constraint.

Table 1.

Expected Size of Confidence Sets (ii) and (iii)

  α     δ     η (7)   E_0|C_2| (10)   η' (9)   E_0(R - L) (11)
  .1    0.7   2.56    19.1            2.18     17.9
  .1    1.0   2.39     8.2            2.08      9.2
  .05   0.7   3.27    25.1            2.88     23.9
  .05   1.0   3.09    12.2            2.78     11.2
  .01   0.7   4.89    37.1            4.49     37.9
  .01   1.0   4.71    18.2            4.39     17.2
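The entries of Table 1 follow from (5), (7), (9), (10), and (11) by elementary arithmetic. A minimal Python sketch (the function name is hypothetical) that solves (7) for η and (9) for η′ and then evaluates (10) and (11):

```python
import math

RHO = 0.583  # nu(delta) ~ exp(-RHO * delta), approximation (5)


def table1_row(alpha: float, delta: float):
    """Return (eta, E|C_2|, eta', E(R - L)) from (7), (10), (9), (11)."""
    nu = math.exp(-RHO * delta)
    eta = -math.log(1.0 - math.sqrt(1.0 - alpha)) - RHO * delta   # (7) = 1 - alpha
    eta_p = math.log((2.0 - nu) / alpha) - RHO * delta            # (9) = alpha
    d2 = delta * delta
    size_c2 = 2 * math.floor(2 * eta / d2) + 2 / d2 * (2 - 4 * nu + nu ** 2)
    size_lr = (2 * math.floor(2 * eta_p / d2)
               + 2 / d2 * (2 - 4 * nu + 3 * nu ** 2 - 2 * nu ** 3 / 3))
    return eta, size_c2, eta_p, size_lr


for alpha in (0.10, 0.05, 0.01):
    for delta in (0.7, 1.0):
        eta, s2, eta_p, slr = table1_row(alpha, delta)
        print(f"{alpha:.2f} {delta:.1f}  {eta:.2f} {s2:5.1f}  {eta_p:.2f} {slr:5.1f}")
```

The printed rows agree with Table 1 up to at most one unit in the last reported digit.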

In the present context of completely specified distributions there is no sampling theory to develop in order to use the confidence set (iv). However, it seems a difficult problem to give a reasonable approximation for the related set defined in (v). A crude approximation to (3), which might be used as the first step in an iterative numerical or Monte Carlo scheme, is to replace $S_n$ by a Brownian motion process $W(t)$ with drift $-(\delta^2/2)\,\mathrm{sgn}(t)$ and variance $\delta^2$ and to replace the sum in (3) by an integral. One easily sees that the integral over $[0, \infty)$ has the distribution given by Pollak and Siegmund (1985, Proposition 3). This can be convolved with itself to obtain

$$P\Big\{\int_{-\infty}^{\infty}\exp\{W(t)\}\,dt \le c^{-1}\Big\} = (4c/\delta^2)\exp(-4c/\delta^2)\,K_1(4c/\delta^2),$$

where $K_1$ is the modified Bessel function of the second kind.
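The crude approximation can be turned into a value of c numerically. The sketch below assumes the closed form $y e^{-y} K_1(y)$ with $y = 4c/\delta^2$, which is what the convolution gives under the stated drift and variance. It evaluates $K_1$ from the standard integral representation $K_1(y) = \int_0^\infty \exp(-y\cosh t)\cosh t\,dt$ by the trapezoidal rule (to stay within the standard library) and finds c by bisection; the step size, stopping rule, and names are choices made for illustration.

```python
import math


def bessel_k1(y: float, h: float = 0.01) -> float:
    """K_1(y) = integral_0^inf exp(-y cosh t) cosh t dt, by the trapezoidal rule."""
    total, t = 0.5 * math.exp(-y), 0.0  # half weight at t = 0, where cosh t = 1
    while True:
        t += h
        ch = math.cosh(t)
        total += math.exp(-y * ch) * ch
        if y * ch > 50.0:  # the remaining integrand is negligible
            break
    return h * total


def brownian_prob(c: float, delta: float) -> float:
    """Approximation to P_0{sum_n exp(S_n) <= 1/c}: y e^-y K_1(y) with y = 4c/delta^2."""
    y = 4.0 * c / (delta * delta)
    return y * math.exp(-y) * bessel_k1(y)


def c_brownian(alpha: float, delta: float) -> float:
    """Bisection for c with brownian_prob(c, delta) = 1 - alpha (decreasing in c)."""
    lo, hi = 1e-8, 10.0 * delta * delta
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if brownian_prob(mid, delta) > 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for delta in (0.7, 1.0):
    print(delta, round(c_brownian(0.10, delta), 4))
```

For α = .10 this gives c of roughly .011 when δ = 0.7 and .023 when δ = 1.0, close to the Monte Carlo values of c reported for $C_5$ in Table 2.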

Table 2 reports the results of a 1000-repetition Monte Carlo experiment with m = 100 and j = 50 to compare the confidence sets $C_2$, $C_4$, and $C_5$. It confirms that the analytic approximation for the expected size of $C_2$ given in Theorem 1 is reasonably accurate and shows that all three confidence sets have about the same expected size.
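A simulation along the lines of Table 2 can be sketched for $C_2$ and $C_4$ ($C_5$ would additionally need c from (3)). The sketch assumes the normal model of this section with a finite record $x_1, \dots, x_{100}$, a true change after j = 50, candidate change-points n = 0, ..., m, η taken from Table 1, and a fixed seed; it is an illustration, not the program used for Table 2.

```python
import math
import random


def simulate_c2_c4(delta: float, eta: float, alpha: float, m: int = 100,
                   j: int = 50, reps: int = 500, seed: int = 2):
    """Return (noncoverage of C_2, mean |C_2|, noncoverage of C_4, mean |C_4|)."""
    rng = random.Random(seed)
    miss2 = miss4 = size2 = size4 = 0
    for _ in range(reps):
        s = [0.0]  # S_n for a change after observation n, n = 0, ..., m
        for i in range(1, m + 1):
            xi = rng.gauss(0.0 if i <= j else delta, 1.0)
            s.append(s[-1] + delta * (delta / 2.0 - xi))
        top = max(s)
        c2 = [n for n, sn in enumerate(s) if top - sn < eta]
        w = [math.exp(sn - top) for sn in s]
        total = sum(w)
        c4, mass = [], 0.0
        for n in sorted(range(m + 1), key=lambda k: w[k], reverse=True):
            if mass >= 1.0 - alpha:
                break
            c4.append(n)
            mass += w[n] / total
        miss2 += j not in c2
        miss4 += j not in c4
        size2 += len(c2)
        size4 += len(c4)
    return miss2 / reps, size2 / reps, miss4 / reps, size4 / reps


print(simulate_c2_c4(delta=0.7, eta=2.56, alpha=0.10))
```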

Table 2.

Monte Carlo Comparison of C_2, C_4, and C_5

                       C_2                C_4                C_5
  α (nominal)   δ     α      E_0|C_2|    α      E_0|C_4|    c      α      E_0|C_5|
  .10           0.7   .090   18.8        .084   19.5        .010   .092   19.3
  .10           1.0   .098    9.6        .085   10.3        .022   .113    9.4
  .05           0.7   .041   24.6        .040   25.2        .005   .047   26.0
  .05           1.0   .048   12.6        .037   13.2        .011   .052   12.6


Although the confidence sets defined in (ii)-(iv) perform similarly on the average, they can treat individual sets of data differently. Figure 1 displays two simulated log likelihoods with m = 101, j = 50, and $\delta = 0.7$. The horizontal line defines the 95% likelihood ratio confidence set (ii). In accordance with the approximation (7) it is drawn 3.27 units below the maximum of the log likelihood function.


In the upper part of Figure 1 the one major peak of the log likelihood is fairly sharp, with the consequence that all the confidence sets are about one half their expected size of 25. The confidence interval defined in (iii) has one point less on each end than the likelihood ratio confidence set. The formal Bayes posterior set, $C_4$, makes a smaller adaptation to the peaked log likelihood; it contains four more points, including the local maximum at 63. The confidence set $C_5$ is the same as the likelihood ratio confidence set.


The  lower  part  of  Figure  1  contains  a  comparatively  flat  log  likelihood  with  two  distinct 
peaks.  The  likelihood  ratio  confidence  set  contains  33  points.  The  interval  modification  is  now 
slightly  larger  because  it  contains  points  of  relatively  low  likelihood:  44,  45,  56-58.  Again  the 
formal  Bayes  posterior  set  adapts  less  to  the  departure  of  the  log  likelihood  from  its  expected 
shape  and  this  time  contains  four  fewer  points  than  the  likelihood  ratio  confidence  set. 

In general, the interval modification (iii) is usually slightly shorter than the likelihood ratio confidence set but can be considerably larger. The formal Bayes posterior set is usually larger than the likelihood ratio set when both sets are small and smaller when both sets are large. This suggests that there may be recognizable subsets making the conditional coverage probability of the likelihood ratio set differ from its nominal value. The confidence set $C_5$ can look rather foolish conditionally. If all the $p_n$ are very small and about equal, it can deliver a small, or perhaps empty, confidence set while the other methods recognize the data as uninformative and yield large confidence sets. Presumably this occurs with small probability.

Overall the evidence given here does not seem persuasive for choosing among the confidence sets (ii)-(v). A possible conclusion is that in more complex problems one may reasonably use whichever method seems most easily adapted to the problem at hand. When the distributions F and G are unknown, but can be imbedded in a common exponential family, one can use a conditioning argument to obtain exact likelihood ratio confidence sets. This is the subject of the next section.

4.  The  Likelihood  Ratio  Method  for  an  Exponential  Family. 

Now suppose that F and G can be imbedded in an exponential family of the form

$$dF_\theta(x) = \exp\{\theta x - \psi(\theta)\}\,dF_0(x)$$

relative to some fixed distribution $F_0$, which without loss of generality can be standardized to have mean 0 and variance 1. Thus for some unknown $\theta_0 \ne \theta_1$ and $j \in \{1, \dots, m\}$, $x_1, \dots, x_j$ have distribution $F_{\theta_0}$ and $x_{j+1}, \dots, x_m$ have distribution $F_{\theta_1}$. The probability on the space of $x_1, \dots, x_m$ will be denoted by P, with the dependence on j, $\theta_0$, and $\theta_1$ suppressed. For the most part we consider a scalar parameter $\theta$, but with some technical complications the methods described below are generally valid.

Several writers, e.g., Davies (1977), Siegmund (1986), and Worsley (1986), have observed that one can extend the likelihood ratio method (ii) of Section 2 to obtain a confidence set for j in the presence of the unknown nuisance parameters $\theta_0, \theta_1$ as follows. Let $H(x) = \sup_\theta\{\theta x - \psi(\theta)\}$, $S_n = x_1 + \cdots + x_n$, and

$$\Lambda_n = nH(n^{-1}S_n) + (m - n)H\{(m - n)^{-1}(S_m - S_n)\}. \qquad (12)$$
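Definition (12) is easy to compute once the conjugate function H is known. The sketch below is illustrative and its names are hypothetical. It subtracts the no-change term $mH(S_m/m)$, which does not depend on n and therefore leaves $\max_n \Lambda_n - \Lambda_j$ unchanged. It supplies H for two families: the unit-variance normal, $H(u) = u^2/2$, and the exponential distribution indexed by its mean $u > 0$, $H(u) = -1 - \log u$. For the normal family the result reduces to $\|nS_m/m - S_n\|^2/\{2n(1 - n/m)\}$, and for the exponential family to $m\log(S_m/m) - n\log(S_n/n) - (m - n)\log\{(S_m - S_n)/(m - n)\}$.

```python
import math
from typing import Callable, List


def lr_process(x: List[float], H: Callable[[float], float]) -> List[float]:
    """Lambda_1, ..., Lambda_{m-1} from (12), minus the constant m H(S_m / m)."""
    m = len(x)
    s = [0.0]
    for xi in x:
        s.append(s[-1] + xi)
    base = m * H(s[m] / m)  # no-change term, constant in n
    return [n * H(s[n] / n) + (m - n) * H((s[m] - s[n]) / (m - n)) - base
            for n in range(1, m)]


def H_normal(u: float) -> float:
    """Conjugate of psi(theta) = theta^2 / 2 (unit-variance normal)."""
    return 0.5 * u * u


def H_exponential(u: float) -> float:
    """Conjugate for the exponential family indexed by its mean u > 0."""
    return -1.0 - math.log(u)


lam = lr_process([1.0] * 10 + [4.0] * 10, H_exponential)
n_hat = 1 + lam.index(max(lam))  # list position 0 corresponds to n = 1
print(n_hat)  # 10
```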

The likelihood ratio test of the hypothesis that the change-point is j has acceptance region of the form

$$A_j = \Big\{\max_n \Lambda_n - \Lambda_j < k\Big\}.$$

By sufficiency the conditional probability of $A_j$ given $(S_j, S_m)$ does not depend on $\theta_0, \theta_1$. Hence if one chooses $k = k(j, \xi_1, \xi_2)$ so that

$$P(A_j \mid S_j = \xi_1, S_m = \xi_2) = 1 - \alpha$$

for all j, then the set of values j which are accepted by the test is a $(1 - \alpha)100\%$ confidence set.

It is not actually necessary to solve for $k(j, \xi_1, \xi_2)$ in order to determine the confidence set. Given $S_j$ and $S_m$, $\Lambda_j$ is constant, and hence the confidence set is most easily determined as the set of j for which

$$P\Big\{\max_n \Lambda_n < \big(\max_n \Lambda_n\big)_{\rm obs} \,\Big|\, S_j, S_m\Big\} < 1 - \alpha. \qquad (13)$$

Approximations for this conditional probability which seem adequate for many cases are given below.
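For normal observations with known unit variance the conditional probability in (13) can also be estimated by simulation, which may be useful for borderline values of j. Given $S_j$, the vector $(x_1, \dots, x_j)$ is distributed as $z_i - \bar z + S_j/j$ with $z_1, \dots, z_j$ i.i.d. N(0, 1), whatever the mean, and similarly for $(x_{j+1}, \dots, x_m)$ given $S_m - S_j$. The sketch below (unit variance, plain Monte Carlo, hypothetical names) uses that fact together with $\Lambda_n$ in the centered form $\|nS_m/m - S_n\|^2/\{2n(1 - n/m)\}$, which differs from (12) by a constant given $S_m$.

```python
import random
from typing import List


def centered_lr(x: List[float]) -> List[float]:
    """Unit-variance normal case: (n S_m/m - S_n)^2 / (2 n (1 - n/m)), n = 1, ..., m-1."""
    m = len(x)
    s = [0.0]
    for xi in x:
        s.append(s[-1] + xi)
    return [(n * s[m] / m - s[n]) ** 2 / (2 * n * (1 - n / m)) for n in range(1, m)]


def conditional_prob(x: List[float], j: int, reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P{max_n Lambda_n < (max_n Lambda_n)_obs | S_j, S_m}."""
    m = len(x)
    rng = random.Random(seed)
    observed = max(centered_lr(x))
    left, right = sum(x[:j]) / j, sum(x[j:]) / (m - j)
    below = 0
    for _ in range(reps):
        z1 = [rng.gauss(0.0, 1.0) for _ in range(j)]
        z2 = [rng.gauss(0.0, 1.0) for _ in range(m - j)]
        b1, b2 = sum(z1) / j, sum(z2) / (m - j)
        sim = [z - b1 + left for z in z1] + [z - b2 + right for z in z2]
        if max(centered_lr(sim)) < observed:
            below += 1
    return below / reps


# j belongs to the (1 - alpha) confidence set when the estimate is below 1 - alpha.
x = [0.0] * 50 + [2.0] * 50
print(conditional_prob(x, 50, reps=200), conditional_prob(x, 10, reps=200))
```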

Bayesian credible sets for the change-point have been considered by Smith (1975) and Raftery and Akman (1986). Although some numerical computation is required, the computational problems are not particularly onerous. However, the elegant relation of Section 2, where any shift equivariant credible set for the uniform prior was also a confidence set, is no longer valid. Results of Stein (1985) lead one to hope that a similar relation is approximately true in the present context; but because the likelihood function is not smooth, a new argument is required to make such a relation precise.

Some close cousins of the likelihood ratio confidence set might also be considered. For example, Worsley's (1986) $D_\alpha$ includes j in the confidence set if the likelihood ratio tests for no change in $[0, j - 1]$ and in $[j, m]$ are both accepted at significance levels greater than $1 - (1 - \alpha)^{1/2} \approx \alpha/2$. Alternatively, Pettitt's (1980) test might be inverted to yield a confidence set. A third possibility is to invert the likelihood ratio test in the conditional model given $S_m$. It would be interesting to study the expected sizes of these confidence sets along the lines of Section 3, but the computations will be substantially more complicated. At present one can make the following qualitative comparisons. (i) If one considers the boundary crossing problems defined by the likelihood ratio confidence set and Worsley's $D_\alpha$ in the simple case of a normal mean, one sees that for a "typical" sample path Worsley's $D_\alpha$ is more likely to include values of j far from the true one and less likely to include close-by values. (ii) Pettitt's test presumably gives smaller confidence sets than the likelihood ratio test for values of j near m/2 and larger sets for values of j near 0 and m. See James, James, and Siegmund (1987) for related results about the power of the tests. An objection to the use of Pettitt's test is that for values of j not close to m/2 the two factors in the relevant probability (cf. (14) below) are quite unequal, with the result that the confidence sets are biased in the direction of m/2 and hence give the impression that the change-point is closer to m/2 than is actually the case.

Given $(S_j, S_m)$ the random variables $\max_{n \le j}\Lambda_n$ and $\max_{j \le n < m}\Lambda_n$ are conditionally independent, and hence the left-hand side of (13) is of the form

$$P\Big(\max_{n \le j}\Lambda_n < a \,\Big|\, S_j, S_m\Big)\,P\Big(\max_{j \le n < m}\Lambda_n < a \,\Big|\, S_j, S_m\Big). \qquad (14)$$

These two probabilities present similar computational problems, so it suffices to consider the second one, or equivalently

$$P\Big\{\max_{j \le n < m}\Lambda_n \ge a \,\Big|\, S_j, S_m\Big\}. \qquad (15)$$


In order to evaluate (15), Worsley (1986) in the special case of exponentially distributed observations uses repeated numerical integration, and Siegmund (1986) in the case of normal observations with known variance gives an asymptotic approximation. Related approximations representing different compromises between accuracy and simplicity are suggested below.

Suppose initially that $F_\theta$ is a d-variate normal distribution with mean vector $\theta$ and identity covariance matrix. The case of an arbitrary, known covariance matrix is easily reduced to this one. Then the probability (15) equals

$$P\Big\{\max_{j \le n < m} \frac{\|nS_m/m - S_n\|^2}{2n(1 - n/m)} \ge a \,\Big|\, jS_m/m - S_j = \xi\Big\}, \qquad (16)$$

for which Siegmund (1986) in the case d = 1 gives an approximation when j, a, and $\|\xi\|$ are proportional to m and

$$c^2 = 2a - \|\xi\|^2/\{j(1 - j/m)\} \qquad (17)$$

is a positive multiple of m as $m \to \infty$. A generalization of that argument shows that (16) is approximately

$$[\ldots]\,\nu\big([\ldots]\big)\exp(-c^2/2), \qquad (18)$$

where $\nu$ is defined in (4) and given approximately by (5). Appendix B gives a version of (18) for exponential families.

One can obtain a simpler and quite general approximation by means of weak convergence arguments that replace the likelihood ratio process $\Lambda_n$ by its continuous-time analogue in $t = n/m$, built from a d-dimensional Brownian bridge superimposed on a triangular drift. This approach leads to (18) with $\nu \equiv 1$. Although the approximation is quite conservative, its simplicity and generality make it useful in complicated cases.

One obtains a different approximation of (15) by assuming that [...]. Then

$$P\Big\{\max_{j \le n < m}\Lambda_n \ge a \,\Big|\, S_j, S_m\Big\} \approx \nu\big([2\Lambda_j/\{j(1 - j/m)\}]^{1/2}\big)\exp\{-(a - \Lambda_j)\}. \qquad (19)$$


This approximation has the disadvantage that it does not depend on d. We shall see its advantages below. From the simulations in Table 6 of Siegmund (1986) in the case d = 1 one can see that (19) is reasonably accurate for the range of j, m, and $\xi$ considered there. Presumably it is less accurate for larger c, smaller m, and/or larger d, but it seems more than adequate for many cases of interest.

For smooth exponential families the approximation (19) takes the form

$$P\Big\{\max_{j \le n < m}\Lambda_n \ge a \,\Big|\, S_j, S_m\Big\} \approx \nu^*\exp\{-(a - \Lambda_j)\}, \qquad (20)$$

where $a - \Lambda_j$ is assumed small compared to m and $\nu^*$ is a distribution dependent quantity whose exact definition is given in Appendix B. A detailed example involving the exponential distribution is discussed below.

In the normal case, according to the approximations (19) and (5) the confidence set defined by (13) is the set of all j such that

$$\Big[1 - \exp\Big(-.583\,[2\Lambda_j/\{j(1 - j/m)\}]^{1/2} - \big(\max_n \Lambda_n - \Lambda_j\big)\Big)\Big]^2 < 1 - \alpha. \qquad (21)$$

Even when one questions the accuracy of (19) or when the data are not normal, the central limit theorem suggests the use of (21) as a first approximation. A better approximation, simulation, or numerical methods can be used to decide whether values of j on the borderline according to (21) should be included in or excluded from the confidence set.
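Rule (21) is straightforward to apply. A minimal sketch for unit-variance normal data $x_1, \dots, x_m$, using the centered form of $\Lambda_n$ from (16), so that $2\Lambda_j/\{j(1 - j/m)\}$ is the squared estimated shift, and ρ = .583 from (5); the function name is hypothetical:

```python
import math
from typing import List

RHO = 0.583


def approx_confidence_set(x: List[float], alpha: float) -> List[int]:
    """All j in 1, ..., m-1 satisfying (21) for unit-variance normal data x_1, ..., x_m."""
    m = len(x)
    s = [0.0]
    for xi in x:
        s.append(s[-1] + xi)
    lam = {n: (n * s[m] / m - s[n]) ** 2 / (2 * n * (1 - n / m)) for n in range(1, m)}
    top = max(lam.values())
    keep = []
    for j, lam_j in lam.items():
        shift = math.sqrt(2 * lam_j / (j * (1 - j / m)))  # estimated |theta_1 - theta_0|
        if (1.0 - math.exp(-RHO * shift - (top - lam_j))) ** 2 < 1.0 - alpha:
            keep.append(j)
    return keep


print(approx_confidence_set([0.0] * 50 + [2.0] * 50, alpha=0.05))  # [49, 50, 51]
```

The noise-free example only checks the mechanics: at j = 50 the estimated shift is exactly 2 and the left-hand side of (21) is about .47, while at j = 48 and j = 52 it already exceeds .95.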

Note also the formal similarity between (19) and (6). To the extent that $[2\Lambda_j/\{j(1 - j/m)\}]^{1/2}$ is nearly constant over the values j of interest, e.g., when the likelihood ratio statistic is sharply peaked and hence the confidence set is small, (21) shows that the confidence set consists of those j for which $\Lambda_j$ is within some distance of $\max_n \Lambda_n$, which can be displayed graphically as in Section 2.

Figure 2 shows the log likelihood ratio statistic and the approximate cutoff for the confidence set for the same simulated data as in Figure 1. Qualitatively the cases of known and unknown $\delta$ look quite similar. Usually the confidence set is larger in the case of unknown $\delta$, and this is indeed true in the lower plot. However, the reverse is true in the upper plot, presumably because the procedure in effect estimates $\delta$ and then acts as if the estimated value, which in this case is large, is the true one.

As an illustration we consider the British coal mining accident data of Maguire, Pearson and Wynn (1952), as extended and corrected by Jarrett (1979). Worsley (1986) has analysed the original data and determined the likelihood ratio confidence set by numerical computation of (14).

The data are intervals in days between accidents in British coal mines in which at least ten deaths occurred. Jarrett's (1979) data involve m = 190 intervals from 15 March 1851 to 22 March 1962, a period of 40,549 days. Under the assumption that the intervals $y_1, \dots, y_m$ are independent and exponentially distributed with a change after the jth observation in the mean time between accidents, we shall determine a likelihood ratio confidence set for j.

The likelihood ratio statistic is
$$\max_n \Lambda_n = \max_n \bigl\{m\log(W_m/m) - n\log(W_n/n) - (m-n)\log[(W_m-W_n)/(m-n)]\bigr\},$$
where $W_n = y_1 + \cdots + y_n$. For Jarrett's data the maximum value equals 45.6 and is attained at $n = 124$, in the year 1890. According to (B.13)-(B.15) in Appendix B, the likelihood ratio confidence set is given approximately as the set of all $j$ such that

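$$\bigl[1-\nu^*(\hat\lambda_1/\hat\lambda_2)\exp\{-(\max_n\Lambda_n-\Lambda_j)\}\bigr]\bigl[1-(\hat\lambda_2/\hat\lambda_1)\exp\{-(\max_n\Lambda_n-\Lambda_j)\}\bigr]<1-\alpha, \qquad (22)$$
where $\hat\lambda_1 = j/W_j$, $\hat\lambda_2 = (m-j)/(W_m-W_j)$, and $\nu^*$ is defined in (B.14). The use of (22) assumes that $\hat\lambda_1 > \hat\lambda_2$. If $\hat\lambda_2 > \hat\lambda_1$, then the values of $\hat\lambda_1$ and $\hat\lambda_2$ should be interchanged.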
The approximation (22) gives the set {116, 117, ..., 128, [illegible]} as a 95% confidence set for the change-point. This corresponds to the interval from 1887 to 1894 together with an isolated point in [illegible].
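Below is a sketch of the exponential computation. It covers the statistic $\Lambda_n$ given above and the set (22), read as $[1-\nu^*(\hat\lambda_1/\hat\lambda_2)e^{-D_j}][1-(\hat\lambda_2/\hat\lambda_1)e^{-D_j}] < 1-\alpha$ with $D_j = \max_n\Lambda_n - \Lambda_j$. The function names are hypothetical. The code does not compute $\nu^*$ from (B.14). Instead, `nu_star` is a caller-supplied function, evaluated at the ratio of the larger to the smaller estimated rate.

```python
import math

def exponential_lambdas(y):
    """Lambda_n, n = 1, ..., m - 1, for positive intervals y:
    m log(W_m/m) - n log(W_n/n) - (m - n) log[(W_m - W_n)/(m - n)]."""
    m = len(y)
    w_m = sum(y)
    w_n = 0.0
    lam = {}
    for n in range(1, m):
        w_n += y[n - 1]                              # W_n
        lam[n] = (m * math.log(w_m / m) - n * math.log(w_n / n)
                  - (m - n) * math.log((w_m - w_n) / (m - n)))
    return lam

def exponential_confidence_set(y, nu_star, alpha=0.05):
    """Approximate set (22); nu_star(ratio) is supplied by the caller."""
    m = len(y)
    lam = exponential_lambdas(y)
    lam_max = max(lam.values())
    w_m = sum(y)
    w_j = 0.0
    conf_set = []
    for j in range(1, m):
        w_j += y[j - 1]
        rate1 = j / w_j                              # hat lambda_1
        rate2 = (m - j) / (w_m - w_j)                # hat lambda_2
        if rate2 > rate1:                            # (22) assumes rate1 > rate2
            rate1, rate2 = rate2, rate1
        e = math.exp(-(lam_max - lam[j]))
        lhs = (1.0 - nu_star(rate1 / rate2) * e) * (1.0 - (rate2 / rate1) * e)
        if lhs < 1.0 - alpha:
            conf_set.append(j)
    return conf_set, lam
```

Applying this to Jarrett's intervals requires both the data and a numerical version of (B.14), neither of which is reproduced here.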

Application of (22) to the original Maguire, Pearson and Wynn (1952) data gives precisely the same confidence set that Worsley computed numerically. However, because of discrepancies between the two data sets, the years covered by the two confidence sets are slightly different.

Raftery and Akman (1986) give a flat prior Bayesian analysis of these data. It appears from their calculations and figure that a highest posterior set estimate for the change-point is essentially the same as the confidence set computed here. Actually, Raftery and Akman consider a continuous time Poisson process model and hence allow the change to occur between event times. It is a straightforward matter to adapt the theory developed here to allow for that possibility.

Cobb  (  1978)  has  suggested  an  extension  of  method  (iv)  in  Section  2  to  deal  with  nuisance 
parameters,  but  it  contains  some  arbitrary  features  which  may  make  it  difficult  to  implement 
with  small  or  moderate  sample  sizes.  An  interesting  variant  of  Cobb’s  analysis  has  recently 
been  proposed  by  Hinkley  and  Schechtman  (1987). 


5. Joint Confidence Sets

The likelihood ratio method can also be adapted to give joint confidence sets for the change-point $j$ and some function $\delta$ of the parameters $\theta_0$ and $\theta_1$. We begin with the simple case in which the $x_i$ are $d$-dimensional normal with identity covariance matrix and mean $\theta_0$ or $\theta_1$ according as $1 \le i \le j$ or $j < i \le m$. We take $\delta = \theta_1 - \theta_0$.

The acceptance region of the likelihood ratio test that the parameters are $j$ and $\delta$ is
$$A_{j,\delta} = \Bigl\{\sup_i \Lambda_i - \Bigl[\Lambda_j - \frac{\|jS_m/m - S_j - j(1-j/m)\delta\|^2}{2j(1-j/m)}\Bigr] < c^2/2\Bigr\},$$
where $\Lambda_i = \|iS_m/m - S_i\|^2/\{2i(1-i/m)\}$ and $c = c(j,\delta)$ is chosen to satisfy
$$P_{j,\delta}(A_{j,\delta}) = 1-\alpha$$
for all $j$. Note that


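$$\sup_i \Lambda_i - \Bigl[\Lambda_j - \frac{\|jS_m/m - S_j - j(1-j/m)\delta\|^2}{2j(1-j/m)}\Bigr] = \Bigl(\sup_i \Lambda_i - \Lambda_j\Bigr) + \frac{\|jS_m/m - S_j - j(1-j/m)\delta\|^2}{2j(1-j/m)}. \qquad (23)$$
The first difference on the right-hand side is necessarily non-negative, so one obtains
$$P_{j,\delta}(A^c_{j,\delta}) = P_{j,\delta}\bigl\{\|jS_m/m - S_j - j(1-j/m)\delta\|^2 \ge c^2 j(1-j/m)\bigr\} + \int_{\|\xi - j(1-j/m)\delta\|^2 < c^2 j(1-j/m)} P_{j,\delta}\Bigl\{\sup_i\Lambda_i - \Lambda_j \ge \frac{c^2}{2} - \frac{\|\xi - j(1-j/m)\delta\|^2}{2j(1-j/m)} \Bigm| jS_m/m - S_j = \xi\Bigr\}\,P_{j,\delta}(jS_m/m - S_j \in d\xi). \qquad (24)$$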

The first term on the right-hand side of (24) is exactly $1 - F_d(c^2)$, where $F_d$ denotes the $\chi^2$ distribution with $d$ degrees of freedom. According to (19),


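$$P_{j,\delta}\Bigl\{\sup_i\Lambda_i - \Lambda_j \ge \frac{c^2}{2} - \frac{\|\xi - j(1-j/m)\delta\|^2}{2j(1-j/m)} \Bigm| jS_m/m - S_j = \xi\Bigr\} \approx 2\nu\bigl[\|\xi\|/\{j(1-j/m)\}\bigr]\exp\Bigl\{-\frac{c^2}{2} + \frac{\|\xi - j(1-j/m)\delta\|^2}{2j(1-j/m)}\Bigr\},$$
provided the exponent is small compared to $m$ as $m \to \infty$. Substitution of this approximation into (24) yields (if $c^2 = o(m)$)
$$P_{j,\delta}(A^c_{j,\delta}) \approx 1 - F_d(c^2) + \nu(\|\delta\|)\,c^d\exp(-c^2/2)\big/\{2^{d/2-1}\Gamma(d/2+1)\}. \qquad (25)$$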

If instead of (19) one uses in (24) the presumably more accurate approximation (18), the integration must be performed numerically.

Using  (25)  one  can  easily  find  an  approximate  confidence  set  by  trial  and  error. 
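The trial and error can be replaced by a bisection, as in this hypothetical sketch. It solves (25) $= \alpha$ for $c$, using the exact $\chi^2_d$ tail for integer $d$. By default it uses $\nu(\delta) \approx \exp(-0.583\delta)$, the approximation that appears in (21); that choice is our assumption. It then checks whether a pair $(j, \delta)$ lies in $A_{j,\delta}$ for univariate ($d = 1$) unit-variance normal data. Differentiating (25) shows that, for $0 \le \nu \le 1$, it is decreasing in $c$ once $c^2 > d/2$, so the bisection starts at $c = (d/2)^{1/2}$.

```python
import math

def chi2_sf(x, d):
    """P(chi^2_d > x) for integer d >= 1, via the recursion
    Q(s + 1, h) = Q(s, h) + h^s e^{-h} / Gamma(s + 1), h = x / 2."""
    h = x / 2.0
    if h <= 0.0:
        return 1.0
    if d % 2 == 0:
        q, s = math.exp(-h), 1.0                 # Q(1, h)
    else:
        q, s = math.erfc(math.sqrt(h)), 0.5      # Q(1/2, h)
    while s < d / 2.0:
        q += math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
        s += 1.0
    return q

def nu_approx(delta):
    """nu(delta) ~ exp(-0.583 delta), as used in (21)."""
    return math.exp(-0.583 * delta)

def noncoverage_25(c, d, nu):
    """Right-hand side of (25)."""
    extra = nu * c ** d * math.exp(-c * c / 2.0) / (
        2.0 ** (d / 2.0 - 1.0) * math.gamma(d / 2.0 + 1.0))
    return chi2_sf(c * c, d) + extra

def critical_c(alpha, d, nu, tol=1e-10):
    """Solve (25) = alpha for c by bisection on [sqrt(d/2), 50]."""
    lo, hi = math.sqrt(d / 2.0), 50.0
    if noncoverage_25(lo, d, nu) <= alpha:
        raise ValueError("alpha is too large for this bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if noncoverage_25(mid, d, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def in_joint_set(x, j, delta, alpha=0.10):
    """Is (j, delta) in A_{j,delta}?  Univariate N(mu, 1) data (d = 1)."""
    m = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)                      # s[i] = S_i
    s_m = s[m]
    lam = [(i * s_m / m - s[i]) ** 2 / (2.0 * i * (1.0 - i / m))
           for i in range(1, m)]
    w = j * (1.0 - j / m)
    stat = max(lam) - lam[j - 1] + (j * s_m / m - s[j] - w * delta) ** 2 / (2.0 * w)
    c = critical_c(alpha, 1, nu_approx(abs(delta)))
    return stat < c * c / 2.0
```

With `nu = 0` the second term of (25) vanishes, and `critical_c` returns the ordinary $\chi^2_d$ critical value (for example 1.96 for $d = 1$, $\alpha = 0.05$).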

An extension of this method to non-normal exponential families requires a consideration of special cases, depending on the parameter $\delta$ of interest. The generalization of (23), in an almost obvious notation, is
$$\sup_i \Lambda_i - \Lambda_j^{(\delta)} = \Bigl(\sup_i \Lambda_i - \Lambda_j\Bigr) + \bigl(\Lambda_j - \Lambda_j^{(\delta)}\bigr).$$

If $\delta$ is a function of the difference between the natural parameters of the exponential family, one obtains distributions parameterized by $\delta$ by computing probabilities conditionally, given $S_m$. An example is parent populations that are Poisson, with $\delta$ the ratio of their means. On the other hand, suppose the parent distributions are exponential and $\delta$ is again the ratio of their means. Then considerations of invariance of the two-sample problem under scale changes show that unconditional probabilities are appropriate. In either case, using a $\chi^2$ approximation to the distribution of $\Lambda_j - \Lambda_j^{(\delta)}$ in conjunction with (20), one obtains an approximation similar to (25). The difference is that the $\nu^*$ appropriate to the distribution under consideration replaces $\nu$. In large samples one may consider replacing $\nu^*$ by $\nu$, but some thought must be given to the choice of argument of the function $\nu$.
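For the exponential case the second bracket in the decomposition above has a closed form, sketched below. We read $\Lambda_j^{(\delta)}$ as the log likelihood ratio against "no change", maximized under the constraint $\lambda_1 = \delta\lambda_2$, where $\lambda_1$ and $\lambda_2$ are the rates before and after the change. That reading of the "almost obvious notation" is our assumption. The constrained maximum is attained at $\lambda_2 = m/(\delta W_j + W_m - W_j)$, which gives

$$\Lambda_j - \Lambda_j^{(\delta)} = j\log(j/W_j) + (m-j)\log\frac{m-j}{W_m-W_j} - j\log\delta - m\log\frac{m}{\delta W_j + W_m - W_j}.$$

This is zero at $\delta = \hat\lambda_1/\hat\lambda_2$. A pair $(j, \delta)$ is accepted when `joint_stat` falls below $c^2/2$. All function names are hypothetical.

```python
import math

def exponential_lambdas(y):
    """Lambda_n = m log(W_m/m) - n log(W_n/n) - (m-n) log[(W_m-W_n)/(m-n)]."""
    m = len(y)
    w_m = sum(y)
    w_n = 0.0
    lam = {}
    for n in range(1, m):
        w_n += y[n - 1]
        lam[n] = (m * math.log(w_m / m) - n * math.log(w_n / n)
                  - (m - n) * math.log((w_m - w_n) / (m - n)))
    return lam

def ratio_profile_stat(y, j, delta):
    """Lambda_j - Lambda_j^(delta) for the constraint lambda_1 = delta * lambda_2."""
    m = len(y)
    w_m = sum(y)
    w_j = sum(y[:j])
    denom = delta * w_j + (w_m - w_j)            # delta W_j + W_m - W_j
    return (j * math.log(j / w_j) + (m - j) * math.log((m - j) / (w_m - w_j))
            - j * math.log(delta) - m * math.log(m / denom))

def joint_stat(y, j, delta):
    """sup_i Lambda_i - Lambda_j^(delta), split as in the decomposition above."""
    lam = exponential_lambdas(y)
    return (max(lam.values()) - lam[j]) + ratio_profile_stat(y, j, delta)
```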

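For the special case of exponentially distributed $y$'s having mean $\lambda^{-1}$, with $\delta = \lambda_1/\lambda_2$, one obtains from (B.13)-(B.15), when $\delta > 1$,
$$P_{j,\delta}(A^c_{j,\delta}) \approx 2[1-\Phi(c)] + 2[\nu^*(\delta) + \delta^{-1}]\,c\,\varphi(c), \qquad (26)$$
where $\nu^*$ is defined in (B.14). When $\delta < 1$, (26) holds with $\delta^{-1}$ in place of $\delta$.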
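Here is a sketch of how a critical value $c$ could be obtained from (26). The value of $\nu^*(\delta)$ from (B.14) is passed in as a number rather than computed, and the function names are hypothetical. The derivative of (26) in $c$ is $2\varphi(c)[-1 + K(1-c^2)]$ with $K = \nu^*(\delta) + \delta^{-1} \ge 0$. This is negative for $c \ge 1$, so bisection on $c \ge 1$ is valid.

```python
import math

def noncoverage_26(c, delta, nu_star_value):
    """Right-hand side of (26).  delta < 1 is replaced by 1/delta, and
    nu_star_value should be nu*(max(delta, 1/delta)) from (B.14)."""
    if delta < 1.0:
        delta = 1.0 / delta
    phi_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    two_tail = math.erfc(c / math.sqrt(2.0))          # 2[1 - Phi(c)]
    return two_tail + 2.0 * (nu_star_value + 1.0 / delta) * c * phi_c

def critical_c_26(alpha, delta, nu_star_value, tol=1e-10):
    """Solve (26) = alpha for c by bisection on [1, 40]."""
    lo, hi = 1.0, 40.0
    if noncoverage_26(lo, delta, nu_star_value) <= alpha:
        raise ValueError("alpha is too large for this bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if noncoverage_26(mid, delta, nu_star_value) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```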

Table 3 gives an approximate 90% joint confidence set for $j$ and $\delta = \lambda_1/\lambda_2$ for the British coal mining data.


Table 3
90% confidence set for (j, δ)

  j      δ                j      δ
 115   (2.7, 3.9)        124   (2.3, 5.3)
 116   (2.5, 4.3)        125   (2.4, 4.9)
 117   (2.4, 4.6)        126   (2.4, 5.0)
 118   (2.3, 4.8)        127   (2.4, 4.7)
 119   (2.4, 4.5)        128   (2.5, 4.4)
 120   (2.5, 4.3)        129   (2.7, 4.0)
 121   (2.4, 4.7)        130   (2.9, 3.6)
 122   (2.4, 4.7)        132   (2.8, 3.9)
 123   (2.3, 4.9)        133   (2.5, 4.5)

Acknowledgements 


I would like to thank M. Pollak and T. Sellke for very helpful discussions, I. Einott for some programming assistance, and Qiwei Yao for spotting the transcription error in Siegmund (1986), which is mentioned in Section 2.


APPENDIX  A 


Informal  Proof  of  Theorem  1. 

We consider only the confidence interval $[L, R]$. The proof for the likelihood ratio confidence set is similar and somewhat simpler. Since the confidence set is equivariant, it suffices to consider the case $j = 0$. To simplify the notation we shall write $P$ and $E$ instead of $P_0$ and $E_0$, $\eta$ instead of $\eta'$, and $S_n$ instead of $S_n'$, and we take $\delta = 1$. Recall that $M = \sup_{n>0} S_n$.

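For arbitrary $n_0 = 1, 2, \ldots$,
$$E(R-L) = \sum_{n=-\infty}^{\infty} P(L < n < R) = P(L < 0 < R) + 2\sum_{n=1}^{\infty} P(L < n < R)$$
$$= P(L < 0 < R) + 2\sum_{n=1}^{\infty}\{P(R > n) - P(L > n)\}$$
$$= 1 + 2\sum_{n=1}^{\infty} P(R > n) + o(1) \quad \text{as } \eta \to \infty$$
$$= 1 + 2n_0 + 2\sum_{n=n_0+1}^{\infty} P(R > n) - 2\sum_{n=1}^{n_0}\{1 - P(R > n)\} + o(1). \qquad (A1)$$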


For positive $n$, by the definition of $R$,
$$P(R > n) = P\Bigl(\sup_{i<n} S_i < \sup_{i\ge n} S_i + \eta\Bigr)$$
$$= \iint_{[-\eta,\infty)\times[0,\infty)} P\Bigl\{S_n \in d\xi,\ \max_{i\ge n}(S_i - S_n) \in dy\Bigr\}\,P\Bigl(\max_{0<i<n} S_i < \eta + \xi + y \Bigm| S_n = \xi\Bigr)\,P\Bigl(\max_{i<0} S_i < \eta + \xi + y\Bigr)$$
$$= \iint_{[-\eta,\infty)\times[0,\infty)} P(S_n \in d\xi)\,P(M \in dy)\,P\Bigl(\max_{0<i<n} S_i < \eta + \xi + y \Bigm| S_n = \xi\Bigr)\,P(M < \eta + \xi + y)$$
$$= \iint_{[0,\infty)\times[0,\infty)} P(S_n \in -\eta + dx)\,P(M \in dy)\,P\Bigl(\max_{0<i<n} S_i < x + y \Bigm| S_n = x - \eta\Bigr)\,P(M < x + y).$$


It may be shown that the contribution to the two series in (A1) from values of $x$ and $k$ outside the range $|k| < \eta^{2/3}$, $|x + k/2| < \eta^{2/3}$ is negligible. Inside this range,
$$P\Bigl(\max_{0<i<n_0+k} S_i \ge x + y \Bigm| S_{n_0+k} = x - \eta\Bigr) - P(M \ge x + y)$$
converges uniformly to 0. Hence, for the purpose of evaluating (A1) asymptotically, $P(R > n_0 + k)$ may be replaced by
$$\int_0^\infty\!\!\int_0^\infty \varphi\Bigl\{\frac{x - \eta + (n_0+k)/2}{(n_0+k)^{1/2}}\Bigr\}(n_0+k)^{-1/2}\,dx\,P(M \in dy)\,\{1 - 2P(M > x + y) + P^2(M > x + y)\}.$$


For $k = 0$ this integral converges to 1/2. The terms in (A1) for $k = \pm 1, \pm 2, \ldots$ may be paired, and after some calculation one obtains
$$\sum_{k\ge1} P(R > n_0 + k) - \sum_{k\le0}\{1 - P(R > n_0 + k)\} = -\frac{1}{2} + \sum_{k\ge1}\bigl[\Phi\{2^{-1}k/(n_0-k)^{1/2}\} - \Phi\{2^{-1}k/(n_0+k)^{1/2}\}\bigr]$$
$$\qquad - 2n_0^{-1/2}\varphi(2^{-1}n_0^{1/2})\int_0^\infty\!\!\int_0^\infty P(M \in dy)\{2P(M > x + y) - P^2(M > x + y)\}\,dx + o(1).$$


A Taylor series expansion, approximation of Riemann sums by integrals, and substitution of the result back into (A1) complete the informal proof of Theorem 1.




APPENDIX  B 


This appendix is concerned with approximations to boundary crossing probabilities like (15).

Let $x_1, x_2, \ldots$ be independent random variables with probability density function of the form
$$f_\theta(x) = \exp\{\theta x - \psi(\theta)\}f_0(x),$$
where $f_0$ is, without loss of generality, standardized to have mean 0 and variance 1. We shall consider only the case of real $x$ and $\theta$, although the extension to the multivariate case is straightforward. However, the case of vector $\theta$ in which some components change at $j$ while others are assumed to remain fixed is substantially more difficult. See James, James, and Siegmund (1986) for the special case of normal observations whose mean changes while the unknown variance does not. One can also handle discrete random variables and continuous time Poisson processes. Some remarks about the necessary modifications in the argument are given below.

Let $S_n = x_1 + \cdots + x_n$, $H(x) = \sup_\theta\{\theta x - \psi(\theta)\}$, $\Lambda_n(\xi, \eta) = nH(\eta/n) + (m-n)H[(\xi-\eta)/(m-n)] - mH(\xi/m)$, $\Lambda_n(\xi) = \Lambda_n(\xi, S_n)$, and define
$$T = \inf\{n : n > m_0,\ \Lambda_n(\xi) > a\} \qquad (B.1)$$
($= \infty$ if $\Lambda_n(\xi) \le a$ for all $m_0 < n < m$). Also put $\mu = \psi'(\theta)$. We shall write $P_\mu$ to denote dependence of probabilities on the parameter $\theta$. For events $A$ defined in terms of $x_1, \ldots, x_m$ let
$$P^{(m)}_\xi(A) = P_\mu(A \mid S_m = \xi).$$
By sufficiency this probability does not depend on $\mu$. Theorem B.1 gives approximations for $P^{(m)}_\xi(T < m \mid S_{m_0} = \eta)$.

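In order to describe those approximations, let $\theta = \theta(x)$ be defined by $\psi'(\theta) = x$, so $H(x) = \theta x - \psi(\theta)$. Note that $H'(x) = \theta$ and $xH'(x) - H(x) = \psi(\theta)$. For $\mu_1 \ne \mu_2$ let $\theta_1 = \theta(\mu_1)$ and $\theta_2 = \theta(\mu_2)$, and define
$$\tau = \inf\{n : (\theta_1 - \theta_2)S_n - n[\psi(\theta_1) - \psi(\theta_2)] > b\}.$$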




Also let
$$\nu^*(\mu_1, \mu_2) = \lim_{b\to\infty} E_{\mu_1}\exp\bigl(-\{(\theta_1 - \theta_2)S_\tau - \tau[\psi(\theta_1) - \psi(\theta_2)] - b\}\bigr). \qquad (B.2)$$
The limit indicated in (B.2) exists as a consequence of the renewal theorem. A general method for computing $\nu^*$ numerically has been given by Woodroofe (1979). In the special case of normal $x$'s, $\nu^*(\mu_1, \mu_2) = \nu(|\mu_1 - \mu_2|)$, where $\nu$ is defined in (4) and given approximately by (5). The case of exponentially distributed $x$'s is discussed below.

Theorem B.1. Assume for fixed $0 < t_0 < 1$, $a_0 > 0$, $\xi_0$, and $\eta_0 \ne \xi_0 t_0$ that $m_0 \sim mt_0$, $a \sim ma_0$, $\xi \sim m\xi_0$, and $\eta \sim m\eta_0$. Let $t_m$ be defined by
$$t_m H(\eta_0/t_0) + (1 - t_m)H\{(\xi_0 - t_m\eta_0/t_0)/(1 - t_m)\} - H(\xi_0) = a_0,$$
and assume that $t_0 < t_m < 1$. Then as $m \to \infty$, for $T$ defined by (B.1),
$$P^{(m)}_\xi\{T < m \mid S_{m_0} = \eta\} \sim \exp\{-[a - \Lambda_{m_0}(\xi, \eta)]\} \times [\text{illegible}] \qquad (B.3)$$

where $\nu^*$ is defined in (B.2).

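A simpler approximation to $P^{(m)}_\xi\{T < m \mid S_{m_0} = \eta\}$ is obtained by assuming
$$0 < a - \Lambda_{m_0}(\xi, \eta) = o(m). \qquad (B.4)$$
In this case $t_m = t_0$, so (B.3) becomes
$$P^{(m)}_\xi\{T < m \mid S_{m_0} = \eta\} \sim \nu^*\bigl(\eta_0/t_0,\ (\xi_0 - \eta_0)/(1 - t_0)\bigr)\exp\{-[a - \Lambda_{m_0}(\xi, \eta)]\}. \qquad (B.5)$$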

Complete  proofs  of  (B.3)  and  (B.5)  are  quite  long  and  technical.  The  main  idea  and  some 
important  lemmas  are  given  here.  The  method  is  inspired  by  that  of  Lai  and  Siegmund  (1977, 
Section  3),  but  it  differs  in  several  crucial  ways. 

Let $f_n$ denote the $n$-fold convolution of $f_0$ ($n = 1, 2, \ldots$) and assume that $f_n$ has an integrable characteristic function for some $n$. The following large deviation approximation for $f_n$ is used repeatedly (see Borovkov and Rogozin, 1965). As $n \to \infty$,
$$f_n(nx) \sim [H''(x)/2\pi n]^{1/2}\exp[-nH(x)]. \qquad (B.6)$$
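As a numerical illustration of (B.6), take the standardized exponential case treated at the end of this appendix. There $x = 1 - y$ with $y \sim$ Exp(1), so $f_0(x) = e^{x-1}$ for $x < 1$, $\psi(\theta) = \theta - \log(1+\theta)$, $H(x) = -x - \log(1-x)$, and $H''(x) = (1-x)^{-2}$. Since $S_n = n - W_n$ with $W_n \sim$ Gamma$(n, 1)$, the exact $f_n$ is available. A short calculation shows that the ratio of exact to approximate density is $\exp\{(n-1)\log n - n + \tfrac12\log(2\pi n) - \log\Gamma(n)\}$. This ratio does not depend on $x$ and, by Stirling's series, is close to $\exp\{-1/(12n)\}$. The sketch below checks this; its function names are hypothetical.

```python
import math

def H_exp(x):
    """H(x) = sup_theta {theta x - psi(theta)} for psi(theta) = theta - log(1 + theta);
    equals -x - log(1 - x) for x < 1."""
    return -x - math.log1p(-x)

def log_f_exact(n, x):
    """log f_n(n x): S_n = n - W_n with W_n ~ Gamma(n, 1), so f_n(s) is the
    Gamma(n, 1) density evaluated at n - s."""
    w = n * (1.0 - x)
    return (n - 1) * math.log(w) - w - math.lgamma(n)

def log_f_b6(n, x):
    """log of the right-hand side of (B.6), using H''(x) = (1 - x)^{-2}."""
    h2 = (1.0 - x) ** -2
    return 0.5 * math.log(h2 / (2.0 * math.pi * n)) - n * H_exp(x)

def ratio_b6(n, x):
    """Exact density divided by the (B.6) approximation."""
    return math.exp(log_f_exact(n, x) - log_f_b6(n, x))
```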


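Let $Q = \int_{-\infty}^{\infty} P^{(m)}_{\xi'}\,d\xi'/(2\pi)^{1/2}$, and let $L_n$ denote the likelihood ratio of $x_1, \ldots, x_n$ under $Q$ relative to $P^{(m)}_\xi$ ($n = 1, 2, \ldots, m-1$). Then
$$L_n = \bigl[f_{m-n}(\xi - S_n)/f_m(\xi)\bigr]^{-1}\int_{-\infty}^{\infty}\bigl[f_{m-n}(\xi' - S_n)/f_m(\xi')\bigr]\,d\xi'/(2\pi)^{1/2}. \qquad (B.7)$$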

The  following  representation  is  basic. 

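Lemma B.1.
$$P^{(m)}_\xi\{T < m \mid S_{m_0} = \eta\} = \int_{-\infty}^{\infty}\frac{f_m(\xi)\,f_{m-m_0}(\xi' - \eta)}{f_{m-m_0}(\xi - \eta)\,f_m(\xi')}\,E^{(m)}_{\xi'}\bigl[L_T^{-1}1_{\{T<m\}} \mid S_{m_0} = \eta\bigr]\,\frac{d\xi'}{(2\pi)^{1/2}}. \qquad (B.8)$$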


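Proof. By Wald's likelihood ratio identity and the definition of $Q$,
$$P^{(m)}_\xi\{\eta < S_{m_0} < \eta + \delta,\ T < m\} = \int_{\{\eta < S_{m_0} < \eta + \delta,\ T < m\}} L_T^{-1}\,dQ$$
$$= \int_{-\infty}^{\infty}\int_{\{\eta < S_{m_0} < \eta + \delta,\ T < m\}} L_T^{-1}\,dP^{(m)}_{\xi'}\,\frac{d\xi'}{(2\pi)^{1/2}}$$
$$= \int_{-\infty}^{\infty}\int_{\eta}^{\eta+\delta} E^{(m)}_{\xi'}\bigl[L_T^{-1}1_{\{T<m\}} \mid S_{m_0} = \eta'\bigr]\,P^{(m)}_{\xi'}(S_{m_0} \in d\eta')\,\frac{d\xi'}{(2\pi)^{1/2}}.$$
The desired representation follows by dividing by
$$P^{(m)}_\xi(\eta < S_{m_0} < \eta + \delta) = \int_\eta^{\eta+\delta}\{f_{m_0}(\eta')f_{m-m_0}(\xi - \eta')/f_m(\xi)\}\,d\eta'$$
and letting $\delta \to 0$.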

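It follows from (B.6) that
$$\frac{f_m(\xi)\,m}{f_{m_0}(\eta)\,f_{m-m_0}(\xi - \eta)\,m_0} \sim \Bigl\{\frac{2\pi H''(\xi/m)\,m(m - m_0)}{H''(\eta/m_0)\,H''[(\xi - \eta)/(m - m_0)]\,m_0}\Bigr\}^{1/2} e^{\Lambda_{m_0}(\xi, \eta)}.$$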

The measure $\{m_0 f_{m_0}(\eta) f_{m-m_0}(\xi' - \eta)/m f_m(\xi')\}\,d\xi'$ behaves asymptotically like a normal distribution with mean $m\eta_0/t_0$ and standard deviation proportional to $m^{1/2}$, i.e., like a Dirac delta function at $m\eta_0/t_0$. Hence the right-hand side of (B.8) is asymptotic to
$$\exp[\Lambda_{m_0}(\xi, \eta)]\Bigl\{\frac{mH''(\xi_0)(1 - t_0)}{H''(\eta_0/t_0)\,H''[(\xi_0 - \eta_0)/(1 - t_0)]\,t_0}\Bigr\}^{1/2} E^{(m)}_{m\eta_0/t_0}\bigl[L_T^{-1}1_{\{T<m\}} \mid S_{m_0} = \eta\bigr]. \qquad (B.9)$$


The  following  lemma  is  useful  in  approximating  the  conditional  expectation  in  (B.9). 


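Lemma B.2. For $n$ proportional to $m$ as $m \to \infty$, except for an event of negligibly small probability under $P^{(m)}_{m\eta_0/t_0}(\cdot \mid S_{m_0} = \eta)$,
$$L_n \sim \exp[\Lambda_n(\xi)]\Bigl\{\frac{H''(\xi_0)\,m(m - n)}{H''[(\xi_0 - \eta_0 n/m_0)/(1 - n/m)]\,H''(\eta_0/t_0)\,n}\Bigr\}^{1/2}. \qquad (B.10)$$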

The proof of Lemma B.2 has three ingredients. The first is substitution of (B.6) into (B.7). The second is the observation that $mH(\xi'/m) - (m - n)H[(\xi' - S_n)/(m - n)]$ is maximized at $\xi' = mS_n/n$, where it equals $nH(S_n/n)$. The third is a Laplace-type asymptotic expansion of the integral in (B.7).

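The Hájek-Rényi-Chow inequality applied to the $P^{(m)}_{m\eta_0/t_0}$-martingale $(S_n - n\eta_0/t_0)/(1 - n/m)$ shows that, for any $0 < \varepsilon < 1$,
$$P^{(m)}_{m\eta_0/t_0}\bigl\{|S_n - n\eta_0/t_0| > A + n\varepsilon \text{ for some } m_0 < n < m(1 - \varepsilon) \bigm| S_{m_0} = \eta\bigr\}$$
can be made arbitrarily small by taking $A$ sufficiently large. Hence $m^{-1}T \to t_m$ in $P^{(m)}_{m\eta_0/t_0}(\cdot \mid S_{m_0} = \eta)$-probability.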


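It follows from (B.10) that
$$L_T \sim \exp[\Lambda_T(\xi)]\Bigl\{\frac{H''(\xi_0)\,m(1 - t_m)}{H''[(\xi_0 - t_m\eta_0/t_0)/(1 - t_m)]\,H''(\eta_0/t_0)\,t_m}\Bigr\}^{1/2} \qquad (B.11)$$
except for an event of negligibly small probability under $P^{(m)}_{m\eta_0/t_0}(\cdot \mid S_{m_0} = \eta)$.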


The proof of Theorem B.1 can be completed by substituting (B.11) into (B.9) and appealing to Hu's (1987) conditional nonlinear renewal theorem. That theorem says that the distribution of the excess over the boundary, $\Lambda_T(\xi) - a$, under the conditional probability has the same limit as a suitable random walk approximation to $\Lambda_n(\xi)$ under the unconditional probability $P_{\eta_0/t_0}$. See Siegmund (1986, Appendix 2) for an intuitive discussion of nonlinear renewal theory.

Now assume that $y_1, y_2, \ldots, y_m$ are independently and exponentially distributed with mean $\lambda^{-1}$. Let $W_n = y_1 + \cdots + y_n$ and $S_n = n - W_n$. Then $\theta = \lambda - 1$ and $\psi(\theta) = \theta - \log(1 + \theta)$. Let $\lambda_1 = (W_{m_0}/m_0)^{-1}$ and $\lambda_2 = [(W_m - W_{m_0})/(m - m_0)]^{-1}$, and assume that $\lambda_1 > \lambda_2$. (The case $\lambda_1 < \lambda_2$ is similar.) Assuming that (B.4) holds, one can use the lack of memory property of the exponential distribution to obtain


$$P\Bigl\{\max_{m_0 < n < m} \Lambda_n > a \Bigm| W_{m_0}, W_m\Bigr\} \sim \nu^*(\lambda_1/\lambda_2)\exp\{-[a - \Lambda_{m_0}]\}, \qquad (B.13)$$
where in this case
$$\nu^*(\delta) = [\text{illegible}]. \qquad (B.14)$$

Similarly,
$$P\Bigl\{\max_{n < m_0} \Lambda_n > a \Bigm| W_{m_0}, W_m\Bigr\} \sim \lambda_2\lambda_1^{-1}\exp\{-[a - \Lambda_{m_0}]\}. \qquad (B.15)$$

The  details  of  these  evaluations  are  omitted. 
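As a consistency check, the general definition $\Lambda_n(\xi, \eta) = nH(\eta/n) + (m-n)H[(\xi-\eta)/(m-n)] - mH(\xi/m)$ can be specialized to $\psi(\theta) = \theta - \log(1+\theta)$. Then $H(x) = -x - \log(1-x)$, and with $\xi = S_m$, $\eta = S_n$, and $S_n = n - W_n$, the specialization should reproduce the explicit statistic used for the coal mining data in Section 4. A short calculation confirms this, and the sketch below (hypothetical function names) checks it numerically on an arbitrary set of positive intervals.

```python
import math

def H_exp(x):
    """H(x) = -x - log(1 - x), the conjugate of psi(theta) = theta - log(1 + theta)."""
    return -x - math.log1p(-x)

def lambda_via_h(m, n, xi, eta):
    """Lambda_n(xi, eta) = n H(eta/n) + (m - n) H[(xi - eta)/(m - n)] - m H(xi/m)."""
    return (n * H_exp(eta / n) + (m - n) * H_exp((xi - eta) / (m - n))
            - m * H_exp(xi / m))

def lambda_via_w(y, n):
    """Explicit exponential form:
    m log(W_m/m) - n log(W_n/n) - (m - n) log[(W_m - W_n)/(m - n)]."""
    m = len(y)
    w_m, w_n = sum(y), sum(y[:n])
    return (m * math.log(w_m / m) - n * math.log(w_n / n)
            - (m - n) * math.log((w_m - w_n) / (m - n)))

# Usage: with xi = m - W_m and eta = n - W_n the two forms agree for every n.
```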

With minor modifications, the methods developed here yield likelihood ratio confidence sets for a change-point in the intensity of a continuously observed Poisson process. They also apply to many discrete exponential families, even though the nonlinear renewal theorem used in the proof of Theorem B.1 requires that certain distributions be non-arithmetic. However, these are the distributions of the $P_{\eta_0/t_0}$-random walk $(\theta_1 - \theta_2)S_n - n[\psi(\theta_1) - \psi(\theta_2)]$, which usually are non-arithmetic for all but countably many values of [illegible] and $t_0$.


References

Borovkov, A. A. and Rogozin, B. A. (1965). On the multi-dimensional central limit theorem. Theory Probab. Appl. 10, 55-62.

Cobb, G. W. (1978). The problem of the Nile: conditional solution to a change-point problem. Biometrika 65, 243-251.

Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247-254.

Fisher, R. A. (1934). Two new properties of mathematical likelihood. Proc. R. Soc. A 144, 285-307.

Pollak, M. and Siegmund, D. (1985). A diffusion process and its application to detecting a change in the drift of Brownian motion. Biometrika 72, 267-280.

Hinkley, D. V. (1970). Inference about the change-point in a sequence of random variables. Biometrika 57, 1-17.

Hinkley, D. V. and Schechtman, E. (1987). Conditional bootstrap methods in the mean-shift model. Biometrika 74, 85-94.

Hinkley, D. V. (1972). Time-ordered classification. Biometrika 59, 509-523.

Hooper, P. M. (1982). Invariant confidence sets with smallest expected measure. Ann. Statist. 10, 1283-1294.

Hora, R. B. and Buehler, R. J. (1966). Fiducial theory and invariant estimation. Ann. Math. Statist. 37, 643-656.

Hu, I. (1987). Nonlinear renewal theory for conditional random walks. University of Pennsylvania preprint.

Ibragimov, I. A. and Khasminski, R. Z. (1981). Statistical Estimation: Asymptotic Theory. Springer-Verlag, New York-Heidelberg-Berlin.

James, B., James, K. L. and Siegmund, D. (1986). Conditional boundary crossing probabilities with applications to change-point problems. Stanford University Technical Report.

James, B., James, K. L. and Siegmund, D. (1987). Tests for a change-point. Biometrika 74, 71-83.

Jarrett, R. G. (1979). A note on the intervals between coal-mining disasters. Biometrika 66, 191-193.

Lai, T. L. and Siegmund, D. (1977). A nonlinear renewal theorem with applications to sequential analysis I. Ann. Statist. 5, 946-954.

Maguire, B. A., Pearson, E. S. and Wynn, A. H. A. (1952). The time intervals between industrial accidents. Biometrika 39, 168-180.

Pettitt, A. N. (1980). A simple cumulative sum type statistic for the change-point problem with zero-one observations. Biometrika 67, 79-84.

Raftery, A. E. and Akman, V. E. (1986). Bayesian analysis of a Poisson process with a change-point. Biometrika 73, 85-90.

Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer-Verlag, New York-Heidelberg-Berlin.

Siegmund, D. (1986). Boundary crossing probabilities and statistical applications. Ann. Statist. 14, 361-404.

Smith, A. F. M. (1975). A Bayesian approach to inference about a change-point in a sequence of random variables. Biometrika 62, 407-416.

Stein, C. (1985). On the coverage probability of confidence sets based on a prior distribution. Banach Center Publications 16, 485-514.

Stein, C. (1965). Approximation of improper prior measures by prior probability measures. In Bernoulli, Bayes, Laplace Anniversary Volume, J. Neyman and L. M. Le Cam, eds., Springer-Verlag, New York-Heidelberg-Berlin, 217-240.



REPORT DOCUMENTATION PAGE

Title: Confidence Sets in Change-Point Problems
Type of report: Technical Report
Author: David Siegmund
Contract or grant number: N00014-87-K-0078
Performing organization: Department of Statistics, Sequoia Hall, Stanford University, Stanford, California 94305
Program element / work unit number: NR-042-373
Controlling office: Statistics & Probability Program, Office of Naval Research, Arlington, Virginia 22217
Report date: May 1987
Number of pages: 30pp.
Security classification: Unclassified
Distribution statement: Approved for public release; distribution unlimited.
Key words: Change-point, likelihood ratio, boundary crossing probabilities.




